Learning vector quantization for proximity data

نویسنده

  • Daniela Hofmann
چکیده

Prototype-based classifiers such as learning vector quantization (LVQ) often display intuitive and flexible classification and learning rules. However, classical techniques are restricted to vectorial data only, and hence not suited for more complex data structures. Therefore, a few extensions of diverse LVQ variants to more general data which are characterized based on pairwise similarities or dissimilarities only have been proposed recently in the literature. In this contribution, we propose a novel extension of LVQ to similarity data which is based on the kernelization of an underlying probabilistic model: kernel robust soft LVQ (KRSLVQ). Relying on the notion of a pseudoEuclidean embedding of proximity data, we put this specific approach as well as existing alternatives into a general framework which characterizes different fundamental possibilities how to extend LVQ towards proximity data: the main characteristics are given by the choice of the cost function, the interface to the data in terms of similarities or dissimilarities, and the way in which optimization takes place. In particular the latter strategy highlights the difference of popular kernel approaches versus so-called relational approaches. While KRSLVQ and alternatives lead to state of the art results, these extensions have two drawbacks as compared to their vectorial counterparts: (i) a quadratic training complexity is encountered due to the dependency of the methods on the full proximity matrix; (ii) prototypes are no longer given by vectors but they are represented in terms of an implicit linear combination of data, i.e. interpretability of the prototypes is lost. We investigate different techniques to deal with these challenges: We consider a speed-up of training by means of low rank approximations of the Gram matrix by its Nyström approximation. In benchmarks, this strategy is successful if the considered data are intrinsically low-dimensional. We propose a quick check to efficiently test this property prior to training. We extend KRSLVQ by sparse approximations of the prototypes: instead of the full coefficient vectors, few exemplars which represent the prototypes can be directly inspected by practitioners in the same way as data. We

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-Supervised Vector Quantization for proximity data

Semi-supervised learning (SSL) is focused on learning from labeled and unlabeled data by incorporating structural and statistical information of the available unlabeled data. The amount of data is dramatically increasing, but few of them are fully labeled, due to cost and time constraints. This is even more challenging for non-vectorial, proximity data, given by pairwise proximity values. Only ...

متن کامل

Adaptive conformal semi-supervised vector quantization for dissimilarity data

Semi-Supervised Learning Proximity Data Dissimilarity Data Conformal Prediction Generalized Learning Vector Quantization Existing semi-supervised learning algorithms focus on vectorial data given in Euclidean space. But many real life data are non-metric, given as (dis-)similarities which are not widely addressed. We propose a conformal prototype-based classifier for dissimilarity data to semi-...

متن کامل

NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self- Organizing Map

Identifying clusters is an important aspect of data analysis. This paper proposes a noveldata clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizingmap (NGTSOM ) and neural gas (NG) are used in combination with Competitive Hebbian Learning(CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clusteringdata. Different ...

متن کامل

Hierarchical Interpretation of Human Activities Using Competitive Learning

In this paper we describe a method of learning hierarchical representations for describing and recognizing gestures expressed as one and two arm movements using competitive learning methods. At the low end of the hierarchy, the atomic motions (“letters”) corresponding to flow fields computed from successive color image frames are derived using Learning Vector Quantization (LVQ). At the next int...

متن کامل

The Time Adaptive Self Organizing Map for Distribution Estimation

The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the timedecreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016